Skip to content

fix: disconnect peers after pong timeout#424

Open
xdustinface wants to merge 1 commit intov0.42-devfrom
fix/disconnect-after-pong-timeout
Open

fix: disconnect peers after pong timeout#424
xdustinface wants to merge 1 commit intov0.42-devfrom
fix/disconnect-after-pong-timeout

Conversation

@xdustinface
Copy link
Collaborator

@xdustinface xdustinface commented Feb 8, 2026

Allow other peers to come in instead. I think the timeout of 60s is enough to get rid of them if they fail to pong.

Summary by CodeRabbit

  • Bug Fixes
    • Improved detection and disconnection of unresponsive peers
    • Enhanced cleanup of expired peer connection states
    • Better peer lifecycle management during network operations

Allow other peers to come in instead. I think the timeout of 60s is enough to get rid of them if they fail to pong.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 8, 2026

📝 Walkthrough

Walkthrough

The changes enhance peer ping management in the network maintenance loop. The cleanup_old_pings method in the peer module now returns the count of expired pings, and the manager uses this count to determine which peers have expired pings and disconnects them, while also handling per-peer ping failures by updating reputation.

Changes

Cohort / File(s) Summary
Peer Ping Management
dash-spv/src/network/peer.rs
Modified cleanup_old_pings method to return a usize count of removed expired pings, with logic to collect expired nonces, remove them, log warnings, and return the count.
Network Manager Maintenance Loop
dash-spv/src/network/manager.rs
Enhanced maintenance loop to attempt sending pings to peers, handle ping failures with reputation updates, and use the count returned from cleanup_old_pings to identify and disconnect peers with expired pings.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A ping here, a pong there,
Hopping through the network air,
Old pings cleaned, the count's now plain,
Peers expire in the loop's domain,
Tighter management, connections sane!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing peer disconnection after ping/pong timeout, which is the core objective of the pull request.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/disconnect-after-pong-timeout

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@dash-spv/src/network/manager.rs`:
- Around line 890-893: The maintenance loop calls pool.remove_peer(&addr) on
ping-timeout peers but doesn't decrement the cached connected_peer_count or emit
network events, causing counter drift and stale event state; modify the
maintenance loop closure to capture network_event_sender, and after successful
removal invoke the same cleanup path used in start_peer_reader: call
connected_peer_count.fetch_sub(1, Ordering::SeqCst) (or the existing decrement
helper), send NetworkEvent::PeerDisconnected with the peer id and then send
NetworkEvent::PeersUpdated so consumers stay in sync; ensure you reference
pool.remove_peer, connected_peer_count, network_event_sender, and mirror the
logic from start_peer_reader to avoid double-removal.

Comment on lines +890 to 893
for addr in peers_to_disconnect {
log::warn!("Disconnecting peer {} - ping timeout", addr);
pool.remove_peer(&addr).await;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

connected_peer_count is never decremented for ping-timeout disconnections, causing counter drift.

When pool.remove_peer is called here, the peer reader task will find the peer gone from the pool (Line 354) and break, but since remove_peer already returned None at that point (Line 637), the fetch_sub on connected_peer_count (Line 640) is skipped. Over time the cached counter will overcount connected peers, breaking is_connected() and peer_count().

You should also emit NetworkEvent::PeerDisconnected / PeersUpdated here to keep event consumers in sync, matching the cleanup in start_peer_reader (Lines 638–654).

Proposed fix
                 for addr in peers_to_disconnect {
                     log::warn!("Disconnecting peer {} - ping timeout", addr);
-                    pool.remove_peer(&addr).await;
+                    if pool.remove_peer(&addr).await.is_some() {
+                        connected_peer_count.fetch_sub(1, Ordering::Relaxed);
+                        let count = connected_peer_count.load(Ordering::Relaxed);
+                        let addresses = pool.get_connected_addresses().await;
+                        let best_height = pool.get_best_height().await;
+                        let _ = network_event_sender.send(NetworkEvent::PeerDisconnected {
+                            address: addr,
+                        });
+                        let _ = network_event_sender.send(NetworkEvent::PeersUpdated {
+                            connected_count: count,
+                            addresses,
+                            best_height,
+                        });
+                    }
                 }

This requires capturing network_event_sender into the maintenance loop closure (it isn't currently captured).

🤖 Prompt for AI Agents
In `@dash-spv/src/network/manager.rs` around lines 890 - 893, The maintenance loop
calls pool.remove_peer(&addr) on ping-timeout peers but doesn't decrement the
cached connected_peer_count or emit network events, causing counter drift and
stale event state; modify the maintenance loop closure to capture
network_event_sender, and after successful removal invoke the same cleanup path
used in start_peer_reader: call connected_peer_count.fetch_sub(1,
Ordering::SeqCst) (or the existing decrement helper), send
NetworkEvent::PeerDisconnected with the peer id and then send
NetworkEvent::PeersUpdated so consumers stay in sync; ensure you reference
pool.remove_peer, connected_peer_count, network_event_sender, and mirror the
logic from start_peer_reader to avoid double-removal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant